Spectral Clustering for Complex Settings
Abstract
Many real-world datasets can be modeled as graphs, where each node corresponds to a data instance and each edge represents the relation or similarity between two nodes. To partition the nodes into clusters, spectral clustering finds the normalized minimum cut of the graph (in the relaxed sense). Although it is one of the most popular clustering schemes, spectral clustering is limited to a single graph. In practice, however, we often need to jointly consider rich information generated from multiple heterogeneous sources, e.g., scientific data (fMRI scans of different individuals), social data (different types of relationships among different people), and web data (multi-type content). Such complex datasets demand complex graph models. In this dissertation, we explore novel formulations that extend spectral clustering to a variety of complex graph models and study how to apply them to real-world problems. We start by incorporating pairwise constraints into spectral clustering, which extends it from the unsupervised setting to the semi-supervised setting. We then extend our constrained spectral clustering formulation from passive learning to active learning. We justify the effectiveness of our approach by exploring its link to a classic graph-based semi-supervised learning technique, namely label propagation. Finally, we study how to extend spectral clustering to the multi-view learning setting. Our proposed algorithms were not only tested on benchmark datasets but also successfully applied to real-world applications, such as machine-translation-aided document clustering and resting-state fMRI analysis.
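To make the starting point concrete, the following is a minimal sketch of the standard single-graph case the abstract refers to: a two-way split obtained from the relaxed normalized cut. It is the textbook formulation, not the constrained, active, or multi-view algorithms of the dissertation. The function name `spectral_bipartition` and the toy affinity matrix are illustrative assumptions, and the sketch assumes a symmetric, non-negative affinity matrix in which every node has positive degree.

```python
import numpy as np


def spectral_bipartition(W):
    """Two-way split of a graph via the relaxed normalized cut.

    W is assumed to be a symmetric, non-negative affinity matrix
    in which every node has positive degree (hypothetical helper).
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)

    # The eigenvector of the second-smallest eigenvalue, rescaled by
    # D^{-1/2}, is the relaxed cluster indicator; its sign gives the split.
    indicator = d_inv_sqrt * eigvecs[:, 1]
    return (indicator > 0).astype(int)


# Toy graph (illustrative): two triangles joined by one weak edge.
W_demo = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W_demo[i, j] = 1.0
W_demo[2, 3] = W_demo[3, 2] = 0.1

labels = spectral_bipartition(W_demo)
print(labels)
```

Which cluster receives label 0 and which receives label 1 is arbitrary, because an eigenvector is only determined up to sign. For more than two clusters, a common approach is to keep several eigenvectors and run k-means on their rows.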
Similar resources
Robust path-based spectral clustering
Spectral clustering and path-based clustering are two recently developed clustering approaches that have delivered impressive results in a number of challenging clustering tasks. However, they are not robust enough against noise and outliers in the data. In this paper, based on M-estimation from robust statistics, we develop a robust path-based spectral clustering method by defining a robust pa...
Prediction of the structural and spectral properties for L,L-ethylenedicysteine diethylester (EC) and its complex with Technetium-99m radionuclide
The technetium-99m complex of the L,L-ethylenedicysteine diethylester (EC), of the brain imaging agent, was reported as a good candidate to replace renal nuclear medicines such as the OIH radiopharmaceutical. The present work studies the structural, electronic, and spectral properties of the EC compound and its complex with the technetium-99m radionuclide from a theoretical perspective. All co...
Spectral Clustering with Perturbed Data
Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling...
Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering
Clustering algorithms are highly dependent on factors such as the number of clusters, the specific clustering algorithm, and the distance measure used. Inspired by ensemble classification, one approach to reducing the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...
Spectral Clustering in Educational Data Mining
Spectral Clustering is a graph theoretic technique to represent data in such a way that clustering on this new representation is reduced to a trivial task. It is especially useful in complex datasets where traditional clustering methods would fail to find groupings. In previous work we have shown the utility of using K-means clustering for exploiting structure in the data to affect a significan...